Introductions


Learning objectives:


The ggplot2 package has many options and capabilities, so you will probably find the following help resources useful:

  1. http://docs.ggplot2.org/current/
  2. the graphics portion of the R cookbook site at http://www.cookbook-r.com/Graphs/
  3. http://zevross.com/blog/2014/08/04/beautiful-plotting-in-r-a-ggplot2-cheatsheet-3/
  4. RStudio “Data Visualization with ggplot2” cheatsheet
  5. Google

library(ggplot2)

We will again use the gapminder data. To read the data into R and remind yourself of its structure, use the following code. Make sure the .Rdata file is in your working directory.

load("gapminder.Rdata")
str(gapminder)
## 'data.frame':    1704 obs. of  6 variables:
##  $ country  : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ year     : int  1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##  $ pop      : num  8425333 9240934 10267083 11537966 13079460 ...
##  $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ lifeExp  : num  28.8 30.3 32 34 36.1 ...
##  $ gdpPercap: num  779 821 853 836 740 ...

Our initial focus will be on scatter plots. Next class we will learn about other types of graphics in ggplot2. You will first see how certain aesthetics and modifications can be made with ggplot2. Afterwards, you will apply what you learned to recreate some plots.

Scatter Plots - Gapminder data

First, we draw a simple scatter plot of life expectancy versus GDP.

ggplot(data = gapminder) + 
  geom_point(mapping = aes(x = gdpPercap, y = lifeExp))

The form of the scatter plot suggests that a log transformation might be helpful. One possibility is to include the transformation in the initial aes specification.

ggplot(data = gapminder) + 
  geom_point(mapping = aes(x = log10(gdpPercap), y = lifeExp))

This looks okay, but the scale on the x axis is now in “log” units. It might be better to use the original units.

ggplot(data = gapminder) + 
  geom_point(mapping = aes(x = gdpPercap, y = lifeExp)) + 
  scale_x_log10()
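The two approaches above differ only in how the axis is labelled, not in where the points are drawn. The following is a minimal sketch of that claim, using a small toy data frame with made-up values (it is not the gapminder data) so that it runs on its own. It uses ggplot2's layer_data() to pull out the x positions each plot actually uses.

```r
library(ggplot2)

# Toy data with hypothetical values, used only for illustration
toy <- data.frame(gdpPercap = c(500, 1000, 5000, 10000, 50000),
                  lifeExp   = c(40, 50, 62, 70, 80))

# Option 1: transform inside aes(); the axis is labelled in log10 units
p_aes <- ggplot(data = toy) +
  geom_point(mapping = aes(x = log10(gdpPercap), y = lifeExp))

# Option 2: transform the scale; the axis is labelled in the original units
p_scale <- ggplot(data = toy) +
  geom_point(mapping = aes(x = gdpPercap, y = lifeExp)) +
  scale_x_log10()

# x positions computed for the point layer of each plot
x_aes   <- layer_data(p_aes)$x
x_scale <- layer_data(p_scale)$x

# Compare the positions used by the two plots
all.equal(x_aes, x_scale)
```

Because the positions agree, choosing between the two options is a question of which axis labels are easier for the reader to interpret.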

An alert viewer of the graph will notice that the distance between 1000 and 10000 is the same as the distance between 10000 and 100000 and realize that the per capita GDP has been log scaled. But it doesn’t hurt to indicate this explicitly by changing the label on the x axis. As with many things in R, there are several ways to do this.

ggplot(data = gapminder) + 
  geom_point(mapping = aes(x = gdpPercap, y = lifeExp)) + 
  scale_x_log10(name = "per capita GDP (log10 scaled)")

ggplot(data = gapminder) + 
  geom_point(mapping = aes(x = gdpPercap, y = lifeExp)) + 
  scale_x_log10() +
  xlab("per capita GDP (log10 scaled)")

ggplot(data = gapminder) + 
  geom_point(mapping = aes(x = gdpPercap, y = lifeExp)) + 
  scale_x_log10() +
  labs(x = "per capita GDP (log10 scaled)")

All three are fine ways to change the x axis label. Let’s use the third method, also change the y axis label, and save this part of the graphic specification so we don’t have to keep retyping it. Note that aes(x = gdpPercap, y = lifeExp) is now inside ggplot(). The result is the same, but now we can add point aesthetics without having to change the base of our plot.

p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +  
  scale_x_log10() + 
  labs(x = "per capita GDP (log10 scaled)", y = "life expectancy")

p + geom_point()

Using p from above as a starting point, produce each of the plots below.


Plot a

p + geom_point(aes(color = continent))

Plot b

p + geom_point(aes(color = continent)) +
  labs(title = "Plot with least squares line") +
  stat_smooth(method = "lm", se = FALSE)

Plot c

p + geom_point(aes(color = continent, shape = continent)) +
  labs(title = "Plot with different shapes and colors") +
  stat_smooth(method = "lm", se = FALSE)
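Plots b and c show a single least squares line because color is mapped only inside geom_point(), so stat_smooth() inherits just x and y from p. As a hedged sketch of the alternative, the code below maps color in ggplot() instead, which every layer then inherits, and counts the groups the smoother fits. The toy data frame and its values are hypothetical stand-ins for the gapminder data.

```r
library(ggplot2)

# Toy data with hypothetical values, two made-up "continents"
toy <- data.frame(
  gdpPercap = c(500, 1000, 2000, 4000, 3000, 6000, 12000, 24000),
  lifeExp   = c(40, 45, 52, 55, 60, 66, 71, 78),
  continent = rep(c("A", "B"), each = 4)
)

# color mapped only in the point layer: the smoother sees all rows as one group
one_line <- ggplot(data = toy, mapping = aes(x = gdpPercap, y = lifeExp)) +
  scale_x_log10() +
  geom_point(aes(color = continent)) +
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE)

# color mapped in ggplot(): inherited by the smoother, one fit per continent
by_group <- ggplot(data = toy,
                   mapping = aes(x = gdpPercap, y = lifeExp, color = continent)) +
  scale_x_log10() +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Number of separate fits in the smoother layer (layer 2) of each plot
n_one <- length(unique(layer_data(one_line, 2)$group))
n_grp <- length(unique(layer_data(by_group, 2)$group))
```

Printing one_line and by_group side by side shows the difference: one overall line versus one colored line per group.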

***

Scatter Plots - Diamonds data

The ggplot2 package comes with a data set called diamonds. Let’s look at it below. To obtain further details type ?diamonds in your console window.

str(diamonds)
## Classes 'tbl_df', 'tbl' and 'data.frame':    53940 obs. of  10 variables:
##  $ carat  : num  0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
##  $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
##  $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
##  $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
##  $ depth  : num  61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
##  $ table  : num  55 61 65 58 58 57 57 55 61 61 ...
##  $ price  : int  326 326 327 334 335 336 336 337 337 338 ...
##  $ x      : num  3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
##  $ y      : num  3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
##  $ z      : num  2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...

Produce each of the following plots.

Plot a

pdiamnond <- ggplot(data = diamonds, mapping = aes(x = carat, y = price))  
pdiamnond + geom_point()

***

Plot b

pdiamnond + geom_point(aes(color = color)) +
  labs(title = "Diamond Carats vs Price",
       subtitle = "data from ggplot2",
       y = "Price($)",
       x = "Carat")


Plot c

pdiamnond + geom_point(aes(color = color,
                           shape = cut)) +
  labs(title = "Diamond Carats vs Price",
       subtitle = "data from ggplot2",
       y = "Price($)",
       x = "Carat")
## Warning: Using shapes for an ordinal variable is not advised
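The warning appears because cut is an ordered factor, and point shapes have no natural ordering. If shapes are still wanted, one possible workaround (a sketch, not the only option) is to drop the ordering inside aes() with factor(cut, ordered = FALSE), which keeps the same levels. The name p_unordered is introduced here only for illustration.

```r
library(ggplot2)

# cut is stored as an ordered factor in diamonds
is.ordered(diamonds$cut)

# Dropping the ordering keeps the same levels but makes it a plain factor
is.ordered(factor(diamonds$cut, ordered = FALSE))

# Same plot as Plot c, with shape mapped to the unordered version of cut
p_unordered <- ggplot(data = diamonds, mapping = aes(x = carat, y = price)) +
  geom_point(aes(color = color,
                 shape = factor(cut, ordered = FALSE))) +
  labs(title = "Diamond Carats vs Price",
       subtitle = "data from ggplot2",
       y = "Price($)",
       x = "Carat",
       shape = "cut")  # restore a readable legend title
```

Whether this is a good idea depends on the audience: with five shapes and seven colors the plot is already hard to read, which is part of what the warning is hinting at.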


Plot d

pdiamnond + geom_point(aes(color = color)) +
  labs(title = "Diamond Carats vs Price",
       subtitle = "data from ggplot2",
       y = "Price($)",
       x = "Carat") +
  facet_wrap(~ cut)

Plot e

pdiamnond + geom_point(aes(color = cut)) +
  labs(title = "Diamond Carats vs Price",
       subtitle = "data from ggplot2",
       y = "Price($)",
       x = "Carat") +
  facet_wrap(~ clarity)